Thanks to Joel Dreessen from MDE for helping me make sense of this monitoring data!

Purpose: to make a graphic describing ozone activity at all monitoring sites in Maryland over a 36-hour period (midnight 10/28/19 to noon 10/29/19).

1. Load packages.

library(tidyverse)
library(lubridate)
library(here)

2. Make sure you’re working in your designated project folder.

If you have an R Project and your data in the same folder as this script, you should be in good shape, but it’s always good to check! This is the only time we need to use the here package.

here()
## [1] "P:/Webinars/2019 Webinars/2019--12Dec3-19--R Training, Jenny St. Clair/Materials/RTraining_JS_IN_PROGRESS/RTraining/Webinar1"

3. Read in data.

Data source: https://www.airnowtech.org/

data <- read.csv("MD_o3_1029_tidy.csv")

4. Some data prep (we’ll dive into specifics later but for now just copy, paste, and run.)

data <- data %>% select(-c(X, time))

data$Site.AQS.split <- as.character(data$Site.AQS)
data <- data %>% separate(Site.AQS.split, c("State.FIPS", "County.AQS"), 2)
data <- data %>% separate(County.AQS, c("County.FIPS", "Site"), 3)

data$Site.AQS <- as.factor(data$Site.AQS)
data$State.FIPS <- as.factor(data$State.FIPS)
data$County.FIPS <- as.factor(data$County.FIPS)
data$Method <- as.factor(data$Method)

data <- data %>% filter(State.FIPS == "24")


data$date <- data$date %>% 
  as_datetime() 

str(data)
## 'data.frame':    724 obs. of  8 variables:
##  $ o3         : int  30 31 33 33 32 29 29 29 33 36 ...
##  $ Agency     : Factor w/ 5 levels "MD1","NJ1","NY1",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Site.AQS   : Factor w/ 68 levels "240031003","240051007",..: 13 13 13 13 13 13 13 13 13 13 ...
##  $ Method     : Factor w/ 2 levels "47","87": 1 1 1 1 1 1 1 1 1 1 ...
##  $ date       : POSIXct, format: "2019-10-28 00:00:00" "2019-10-28 01:00:00" ...
##  $ State.FIPS : Factor w/ 3 levels "24","34","36": 1 1 1 1 1 1 1 1 1 1 ...
##  $ County.FIPS: Factor w/ 37 levels "001","003","005",..: 13 13 13 13 13 13 13 13 13 13 ...
##  $ Site       : chr  "9001" "9001" "9001" "9001" ...
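
If you are curious about what the two separate() calls in the prep step do, here is a minimal sketch on a single ID. The one-row data frame `toy` is made up for illustration; the ID itself is the first Site.AQS level in the str() output above. Passing a number as the separator splits the text after that many characters.

```r
library(tidyr)

# Hypothetical one-row data frame, for illustration only
toy <- data.frame(Site.AQS = "240031003", stringsAsFactors = FALSE)

# sep = 2 splits after the 2nd character: "24" | "0031003"
toy <- separate(toy, Site.AQS, c("State.FIPS", "County.AQS"), sep = 2)

# sep = 3 splits what is left after its 3rd character: "003" | "1003"
toy <- separate(toy, County.AQS, c("County.FIPS", "Site"), sep = 3)

toy
```

So a nine-character AQS ID ends up as a 2-character state code, a 3-character county code, and a 4-character site number.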

5. Make a scatterplot.

Every ggplot graphic needs two functions, ggplot() and a geom_…() function such as geom_point(), connected by a “+”. The data is fed into ggplot(). An aes() function (short for aesthetic) must go inside one of the two, and this is where you specify which variables you’d like to examine. Just like Cartesian coordinates, aes() looks for x first and then y. The geom_…() function specifies the geometry we’d like to use; other commonly used geometries are geom_col() and geom_line().

The plot below tells us the extent of our data, which is helpful, but we need to know more. The dataset also includes the site where each ozone value was recorded, as well as the method used to record it.

TLDR: ggplot requires data fed into ggplot(), a geometry specified by geom_…(), and x/y variables specified inside aes(), which goes inside either ggplot() or geom_…().

data %>% 
  ggplot(aes(x = date, y = o3)) +
  geom_point() 
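
The plot above puts aes() inside ggplot(). As a rough sketch of the other option, here are both placements side by side on a small made-up data frame (`toy` and its `hour` column are hypothetical; the six ozone values are copied from the str() output above). Both versions draw the same points.

```r
library(ggplot2)

# Hypothetical toy data standing in for the ozone data frame
toy <- data.frame(
  hour = 0:5,
  o3   = c(30, 31, 33, 33, 32, 29)
)

# aes() inside ggplot(): every layer added afterwards inherits x and y
p1 <- ggplot(toy, aes(x = hour, y = o3)) +
  geom_point()

# aes() inside geom_point(): the mapping applies to this layer only
p2 <- ggplot(toy) +
  geom_point(aes(x = hour, y = o3))

p1
p2
```

Putting aes() in ggplot() saves typing when several layers share the same x and y, which is the pattern used in the rest of this walkthrough.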

6. Add color.

If we want to see which point came from which site, we can do so by using color. This means adding another variable aesthetic, so it goes inside an aes() function.

Side note: the last line of this chunk adds a theme to the plot. There are many different themes you can use, and I HIGHLY recommend using one anytime you share a plot. It is an easy way to make your work look sharp. I recommend theme_minimal(), theme_light(), theme_bw(), or theme_void(). Be sure to take a few minutes to try these out and pick your favorite! Looks matter.

data %>% 
  ggplot(aes(x = date, y = o3)) +
  geom_point(aes(color = Site.AQS)) +
  theme_bw()
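
One quick way to try the themes out, sketched on made-up data (`toy` and `base_plot` are hypothetical names; swap in the real plot once you like the approach): store the plot in an object, then add a different theme each time.

```r
library(ggplot2)

# Hypothetical toy data, for illustration only
toy <- data.frame(hour = 0:5, o3 = c(30, 31, 33, 33, 32, 29))

# Build the plot once and store it
base_plot <- ggplot(toy, aes(x = hour, y = o3)) +
  geom_point()

# Add a different theme to the stored plot each time
base_plot + theme_minimal()
base_plot + theme_light()
base_plot + theme_bw()
base_plot + theme_void()  # strips the axes and gridlines entirely
```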

Exercise 1: change from point to line

Adding color to the points is helpful, but it’s hard to see what’s going on at each monitor. Change the geometry from geom_point to geom_line in the chunk below, which repeats the code from step 6.

data %>% 
  ggplot(aes(x = date, y = o3)) +
  geom_point(aes(color = Site.AQS))+
  theme_bw()

7. Add facet_wrap()

Now the plot shows individual monitor changes over time, but it’s too busy to make sense of it. We’ll add facet_wrap(~Site.AQS) to the plot.

data %>% 
  ggplot(aes(x = date, y = o3)) +
  geom_line(aes(color = Site.AQS)) +
  theme_bw()+
  facet_wrap(~Site.AQS)
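
facet_wrap() gives each value of the faceting variable its own panel. Here is a minimal sketch on made-up data (the sites A, B, C and their readings are hypothetical, not from the Maryland file) that also uses the ncol argument to control how many columns of panels you get.

```r
library(ggplot2)

# Hypothetical toy data: three made-up sites, four hourly readings each
toy <- data.frame(
  site = rep(c("A", "B", "C"), each = 4),
  hour = rep(0:3, times = 3),
  o3   = c(30, 31, 33, 32,  20, 22, 25, 24,  40, 38, 37, 39)
)

p <- ggplot(toy, aes(x = hour, y = o3)) +
  geom_line() +
  facet_wrap(~site, ncol = 3)  # one panel per site, laid out in 3 columns

p
```

All panels share the same axes by default, which makes it easy to compare sites at a glance.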

8. Rotate labels, remove legend.

There are a number of issues here. The worst one is the labels on the x-axis, so let’s rotate those by 90 degrees. Another is the legend, which is redundant with the panel headers, so let’s ditch that too.

data %>% 
  ggplot(aes(x = date, y = o3))+
  geom_line(aes(color = Site.AQS))+
  theme_bw()+
  facet_wrap(~Site.AQS)+
  theme(axis.text.x = element_text(angle = 90), legend.position = "none")
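
Rotated labels can end up slightly off-center from their tick marks. A common tweak, sketched here on made-up data (`toy` is hypothetical), is to set vjust and hjust inside element_text() along with the angle.

```r
library(ggplot2)

# Hypothetical toy data: six hourly readings starting at midnight
toy <- data.frame(
  time = as.POSIXct("2019-10-28 00:00:00", tz = "UTC") + 3600 * 0:5,
  o3   = c(30, 31, 33, 33, 32, 29)
)

p <- ggplot(toy, aes(x = time, y = o3)) +
  geom_line() +
  theme_bw() +
  # vjust = 0.5 centers each label on its tick mark;
  # hjust = 1 pushes the end of the label up against the axis
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

p
```

Whether this looks better is a judgment call, so compare it with the plain angle = 90 version before settling on one.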

9. Increase the line size.

data %>% 
  ggplot(aes(x = date, y = o3))+
  # size sets the line thickness (ggplot2 3.4.0 and later prefer linewidth = 1.2)
  geom_line(aes(color = Site.AQS), size = 1.2)+
  theme_bw()+
  facet_wrap(~Site.AQS)+
  theme(axis.text.x = element_text(angle = 90), legend.position = "none")

10. Add a title, fix up x and y labels.

data %>% 
  ggplot(aes(x = date, y = o3))+
  geom_line(aes(color = Site.AQS), size = 1.2)+
  theme_bw()+
  facet_wrap(~Site.AQS)+
  theme(axis.text.x = element_text(angle = 90), legend.position = "none")+
  labs(title = "Hourly Ozone at Maryland Sites over 36 Hours", x = "Date/Time", y = "Ozone (ppb)")

Exercise 2:

1. Change the line color to Method instead of Site.AQS (this is case-sensitive).

2. Add a subtitle that says “Method 47 = Ultraviolet Photometry, Method 87 = Ultraviolet Radiation Absorption”.

3. Bring the legend back.

Hint: you can add the subtitle inside the labs() function.

Hint: Look back to where we got rid of the legend.

data %>% 
  ggplot(aes(x = date, y = o3)) +
  geom_line(aes(color = Method), size = 1.2) +
  theme_bw()+
  facet_wrap(~Site.AQS)+
  theme(axis.text.x = element_text(angle = 90))+
  labs(title = "Hourly Ozone at Maryland Sites over 36 Hours",
       subtitle = "Method 47 = Ultraviolet Photometry, Method 87 = Ultraviolet Radiation Absorption")

Homework 1

1. Open the R Markdown script in the Homework 1 folder. The script in the folder works on the iris dataset that comes with R.

2. Change it to use the same data we used here. The data prep is included.

3. Make the histogram show the frequency of different ozone values in the dataset (ozone on the x-axis, no y-axis variable necessary).

4. Remove the legend.

For fun:

1. Use the function plotly::ggplotly() to make the graphic interactive.

https://plot.ly/ggplot2/

Remember, when you run into errors and weird results, don’t spend too much time scratching your head. If you can’t think of a solution after a couple of minutes, google it! Stack Exchange/Stack Overflow is your friend too. If you are really stuck, shoot me an email.

https://ggplot2.tidyverse.org/reference/geom_histogram.html

plot <- data %>% 
  ggplot(aes(o3)) +
  geom_histogram(aes(fill = Site), color = "black") +
  theme_bw()

library(plotly)
ggplotly(plot)
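
One thing to know about geom_histogram(): if you don’t tell it otherwise, it uses 30 bins and prints a message suggesting you pick a better value. A minimal sketch of setting the bin width yourself, on a made-up data frame (`toy` is hypothetical; the ten values are the first ten ozone readings shown in the str() output earlier, and the width of 2 is just an example):

```r
library(ggplot2)

# Hypothetical toy data, for illustration only
toy <- data.frame(o3 = c(30, 31, 33, 33, 32, 29, 29, 29, 33, 36))

p <- ggplot(toy, aes(x = o3)) +
  geom_histogram(binwidth = 2, color = "black") +  # each bar spans 2 units of o3
  theme_bw()

p
```

Try a few widths on the real data; too narrow gives a spiky plot and too wide hides the shape.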